correct prediction
Disproving the Feasibility of Learned Confidence Calibration Under Binary Supervision: An Information-Theoretic Impossibility
Nair, Arjun S., Sinaga, Kristina P.
We prove a fundamental impossibility theorem: neural networks cannot simultaneously learn well-calibrated confidence estimates with meaningful diversity when trained using binary correct/incorrect supervision. Through rigorous mathematical analysis and comprehensive empirical evaluation spanning negative reward training, symmetric loss functions, and post-hoc calibration methods, we demonstrate this is an information-theoretic constraint, not a methodological failure. Our experiments reveal universal failure patterns: negative rewards produce extreme underconfidence (ECE greater than 0.8) while destroying confidence diversity (std less than 0.05), symmetric losses fail to escape binary signal averaging, and post-hoc methods achieve calibration (ECE less than 0.02) only by compressing the confidence distribution. We formalize this as an underspecified mapping problem where binary signals cannot distinguish between different confidence levels for correct predictions: a 60 percent confident correct answer receives identical supervision to a 90 percent confident one. Crucially, our real-world validation shows 100 percent failure rate for all training methods across MNIST, Fashion-MNIST, and CIFAR-10, while post-hoc calibration's 33 percent success rate paradoxically confirms our theorem by achieving calibration through transformation rather than learning. This impossibility directly explains neural network hallucinations and establishes why post-hoc calibration is mathematically necessary, not merely convenient. We propose novel supervision paradigms using ensemble disagreement and adaptive multi-agent learning that could overcome these fundamental limitations without requiring human confidence annotations.
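As a rough illustration of the two quantities the abstract reports, the sketch below computes expected calibration error (ECE) with equal-width bins and the standard deviation of the confidences. The binning scheme, function name, and toy data are assumptions for illustration and are not taken from the paper. It shows how a constant confidence equal to the overall accuracy reaches near-zero ECE with zero diversity, which is the kind of compression the abstract attributes to post-hoc methods.

```python
from statistics import pstdev


def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: weighted mean of |accuracy - mean confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # c == 1.0 falls in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        acc = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(acc - mean_conf)
    return ece


# Hypothetical toy data: 10 predictions, 8 of them correct.
correct = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
constant = [0.8] * 10        # one confidence value, equal to the accuracy
overconfident = [0.99] * 10  # one confidence value, far above the accuracy

ece_constant = expected_calibration_error(constant, correct)
ece_over = expected_calibration_error(overconfident, correct)
spread_constant = pstdev(constant)

print(f"constant 0.8:  ECE={ece_constant:.3f}, std={spread_constant:.3f}")
print(f"constant 0.99: ECE={ece_over:.3f}")
```

Note that the binary `correct` labels are the only supervision in this toy setup: a correct answer contributes the same label whether its confidence was 0.6 or 0.9.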
Rashomon in the Streets: Explanation Ambiguity in Scene Understanding
Spieker, Helge, Betten, Jørn Eirik, Gotlieb, Arnaud, Lazaar, Nadjib, Belmecheri, Nassim
Explainable AI (XAI) is essential for validating and trusting models in safety-critical applications like autonomous driving. However, the reliability of XAI is challenged by the Rashomon effect, where multiple, equally accurate models can offer divergent explanations for the same prediction. This paper provides the first empirical quantification of this effect for the task of action prediction in real-world driving scenes. Using Qualitative Explainable Graphs (QXGs) as a symbolic scene representation, we train Rashomon sets of two distinct model classes: interpretable, pair-based gradient boosting models and complex, graph-based Graph Neural Networks (GNNs). Using feature attribution methods, we measure the agreement of explanations both within and between these classes. Our results reveal significant explanation disagreement. Our findings suggest that explanation ambiguity is an inherent property of the problem, not just a modeling artifact.
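The abstract does not say which agreement measure is used, so the following is only one plausible way to quantify explanation agreement between two models: the Jaccard overlap of their top-k attributed features. The function names, feature names, and attribution values are hypothetical.

```python
def top_k_features(attributions, k):
    """Return the k feature names with the largest absolute attribution."""
    ranked = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)
    return set(ranked[:k])


def top_k_agreement(attr_a, attr_b, k=2):
    """Jaccard overlap of the two top-k feature sets (1.0 = identical sets)."""
    a, b = top_k_features(attr_a, k), top_k_features(attr_b, k)
    return len(a & b) / len(a | b)


# Hypothetical attributions from two equally accurate models for one prediction.
model_a = {"distance": 0.6, "relative_speed": 0.3, "lane": 0.1}
model_b = {"distance": 0.1, "relative_speed": 0.5, "lane": 0.4}

agreement = top_k_agreement(model_a, model_b, k=2)
print(f"top-2 agreement: {agreement:.2f}")
```

Here the two models share only one of their top-2 features, so the overlap is low even though both could predict the same action.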
ALSA: Anchors in Logit Space for Out-of-Distribution Accuracy Estimation
Liu, Chenzhi, Baktashmotlagh, Mahsa, Tang, Yanran, Huang, Zi, Qiu, Ruihong
Estimating model accuracy on unseen, unlabeled datasets is crucial for real-world machine learning applications, especially under distribution shifts that can degrade performance. Existing methods often rely on predicted class probabilities (softmax scores) or data similarity metrics. While softmax-based approaches benefit from representing predictions on the standard simplex, compressing logits into probabilities leads to information loss. Meanwhile, similarity-based methods can be computationally expensive and domain-specific, limiting their broader applicability. In this paper, we introduce ALSA (Anchors in Logit Space for Accuracy estimation), a novel framework that preserves richer information by operating directly in the logit space. Building on theoretical insights and empirical observations, we demonstrate that the aggregation and distribution of logits exhibit a strong correlation with the predictive performance of the model. To exploit this property, ALSA employs an anchor-based modeling strategy: multiple learnable anchors are initialized in logit space, each assigned an influence function that captures subtle variations in the logits. This allows ALSA to provide robust and accurate performance estimates across a wide range of distribution shifts. Extensive experiments on vision, language, and graph benchmarks demonstrate ALSA's superiority over both softmax- and similarity-based baselines. Notably, ALSA's robustness under significant distribution shifts highlights its potential as a practical tool for reliable model evaluation.
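One concrete sense in which softmax discards information can be sketched as follows: softmax is invariant to adding a constant to every logit, so two different logit vectors can map to the same probability vector. This is a minimal illustration of that well-known property, not a description of ALSA's anchors or influence functions; the example vectors are made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two hypothetical logit vectors that differ only by a constant offset of +10.
a = [2.0, 1.0, 0.0]
b = [12.0, 11.0, 10.0]

p_a = softmax(a)
p_b = softmax(b)

# The probabilities coincide, so the offset (and with it the overall logit
# magnitude) cannot be recovered from the softmax output alone.
print(p_a)
print(p_b)
```

A method that works directly on logits can still distinguish `a` from `b`, whereas a method that only sees softmax scores cannot.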